
Appendices A Some Useful Lemmas

Neural Information Processing Systems

This paper studies several equivalent forms of the generalization error, e.g., Eq. (2). This lemma is a consequence of Lemma 2.1, further utilizing some symmetry properties. Recall Eq. (1) in Lemma 2.1; Eq. (2) in the main text follows from the second equation there, which is used to derive the individual bounds. Note that we do not change the definitions of any of the random variables. This, as we have already seen in Eq. (5) in the main text, is used to derive the hypothesis-conditioned CMI bounds in Section 4. To obtain Eq. (14), we choose W appropriately; this is used to derive the supersample-conditioned CMI bounds in Section 4. Like all the previous information-theoretic bounds, the following lemma is widely used in our paper. We also invoke some other lemmas as given below. We note that the reason we introduce four types of SCH stability in Definition 2.1 is that none of them alone suffices. The basic setup is as follows. By Lemma A.3, recalling Eq. (12) in Lemma A.1, and applying Jensen's inequality to the absolute-value function, the first bound follows. The proof is nearly the same as the proof of Theorem 3.1, except that the randomness of the algorithm is now conditioned on for each DV auxiliary function. Similar to the proof of Theorem 3.1, we now prove the first bound: by Lemmas A.2 and A.3, recalling Eq. (14) in Lemma A.1, and applying Jensen's inequality to the absolute-value function, the first bound follows. To prove the second bound, we return to Eq. (20) and take the expectation over the remaining randomness. For the second part of Theorem 4.1, notice that the corresponding choice is valid. The proof is similar to [18, Theorem 2.1].
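The Jensen step invoked repeatedly above is the standard inequality for the (convex) absolute-value function. A minimal sketch in generic notation, where the symbols gen, W, and S are illustrative placeholders rather than the paper's exact definitions:

```latex
% Jensen's inequality for the convex function |.|:
% the absolute value of an expectation is at most
% the expectation of the absolute value.
\[
  \bigl|\,\mathbb{E}[X]\,\bigr| \;\le\; \mathbb{E}\bigl[\,|X|\,\bigr].
\]
% Applied to a generalization-error functional gen(W, S)
% of hypothesis W and sample S, this gives
\[
  \bigl|\,\mathbb{E}_{W,S}\bigl[\mathrm{gen}(W,S)\bigr]\,\bigr|
  \;\le\;
  \mathbb{E}_{W,S}\bigl[\,\bigl|\mathrm{gen}(W,S)\bigr|\,\bigr],
\]
% so any bound on the right-hand side transfers to the
% absolute expected generalization error on the left.
```

This is why an in-expectation bound established via the DV (Donsker-Varadhan) auxiliary function immediately yields a bound on the absolute expected generalization error.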


On the Complexity of Offline Reinforcement Learning with $Q^\star$-Approximation and Partial Coverage

Liu, Haolin, Snyder, Braham, Wei, Chen-Yu

arXiv.org Machine Learning

We study offline reinforcement learning under $Q^\star$-approximation and partial coverage, a setting that motivates practical algorithms such as Conservative $Q$-Learning (CQL; Kumar et al., 2020) but has received limited theoretical attention. Our work is inspired by the following open question: "Are $Q^\star$-realizability and Bellman completeness sufficient for sample-efficient offline RL under partial coverage?" We answer in the negative by establishing an information-theoretic lower bound. Going substantially beyond this, we introduce a general framework that characterizes the intrinsic complexity of a given $Q^\star$ function class, inspired by model-free decision-estimation coefficients (DEC) for online RL (Foster et al., 2023b; Liu et al., 2025b). This complexity recovers and improves the quantities underlying the guarantees of Chen and Jiang (2022) and Uehara et al. (2023), and extends to broader settings. Our decision-estimation decomposition can be combined with a wide range of $Q^\star$ estimation procedures, modularizing and generalizing existing approaches. Beyond the general framework, we make further contributions: By developing a novel second-order performance difference lemma, we obtain the first $ε^{-2}$ sample complexity under partial coverage for soft $Q$-learning, improving the $ε^{-4}$ bound of Uehara et al. (2023). We remove Chen and Jiang's (2022) need for additional online interaction when the value gap of $Q^\star$ is unknown. We also give the first characterization of offline learnability for general low-Bellman-rank MDPs without Bellman completeness (Jiang et al., 2017; Du et al., 2021; Jin et al., 2021), a canonical setting in online RL that remains unexplored in offline RL except for special cases. Finally, we provide the first analysis for CQL under $Q^\star$-realizability and Bellman completeness beyond the tabular case.
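For context on the "second-order performance difference lemma" mentioned above: the classical (first-order) performance difference lemma of Kakade and Langford is the standard starting point. The sketch below states that classical lemma in generic discounted-MDP notation; the paper's second-order refinement is a strengthening of it, not reproduced here:

```latex
% Classical performance difference lemma (Kakade & Langford).
% J(pi): expected discounted return of policy pi;
% d^pi: discounted state-occupancy distribution of pi;
% A^{pi'}(s,a) = Q^{pi'}(s,a) - V^{pi'}(s): advantage of pi'.
\[
  J(\pi) - J(\pi')
  \;=\;
  \frac{1}{1-\gamma}\,
  \mathbb{E}_{s \sim d^{\pi},\, a \sim \pi(\cdot \mid s)}
  \bigl[ A^{\pi'}(s, a) \bigr].
\]
% Under partial coverage, the expectation over d^pi must be
% controlled by the offline data distribution, which is where
% coverage-style concentrability assumptions enter.
```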






A Reduction to No-Memory Proofs

Neural Information Processing Systems

We first need the following lemma, which bounds the prediction shifts and magnitudes of Algorithm 2; see the proof in Appendix A.2. We are now ready to prove Theorem 9. Proof of Theorem 9. We show that Algorithm 2 achieves the desired regret bound, where the last transition used the Lipschitz assumption to bound the gradient. This concludes the second part of the lemma. We give a general example of a BCO algorithm that may be employed in conjunction with our reduction procedure given in Algorithm 2, stated for a positive semi-definite matrix. The proof of Lemma 15 relies on a few standard results.
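The Lipschitz step mentioned above is the standard way to turn a bound on prediction shifts into a bound on loss differences in bandit convex optimization (BCO). A generic sketch, with G denoting an assumed Lipschitz constant and x_t, y_t illustrative iterates (not the paper's exact quantities):

```latex
% If each loss f_t is G-Lipschitz, then the gradient is
% bounded, \|\nabla f_t(x)\| \le G, and hence
\[
  f_t(x_t) - f_t(y_t) \;\le\; G\,\|x_t - y_t\|.
\]
% Summing over rounds, a bound on the cumulative prediction
% shift \sum_t \|x_t - y_t\| translates directly into an
% additive term in the regret
\[
  \mathrm{Regret}_T
  \;=\;
  \sum_{t=1}^{T} f_t(x_t) \;-\; \min_{x \in \mathcal{K}} \sum_{t=1}^{T} f_t(x).
\]
```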